Detecting and labeling speakers on overlapping speech using Vector Taylor Series
Abstract
Successfully modeling overlapping speech is a crucial step towards improving the performance of current speaker diarization systems. In this direction, we present ongoing work on a novel Multi-Class Vector Taylor Series (MC-VTS) approach that models overlapping speech from knowledge of the individual speaker models and the feature extraction process. We explore several variants of the MC-VTS technique that aim at modeling overlapping speech more precisely. Bootstrapping the algorithm with both oracle and diarization output segmentations, we show the potential of this approach in terms of overlapping speech detection and speaker labeling performance through a set of experiments on far-field microphone meeting data.
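As a rough illustration of the general idea of deriving an overlap model from individual speaker models, the sketch below applies a generic first-order Vector Taylor Series expansion in the log-spectral domain. It is not the MC-VTS algorithm from the paper. It assumes log mel filterbank features, one diagonal-covariance Gaussian per speaker, and a phase-insensitive combination y = log(exp(x1) + exp(x2)); the function name `vts_overlap_gaussian` is hypothetical.

```python
import numpy as np

def vts_overlap_gaussian(mu1, var1, mu2, var2):
    """Approximate the Gaussian of two overlapping speakers (illustrative only).

    Assumptions (not taken from the paper): features are log mel energies,
    each speaker is one diagonal Gaussian, and the overlapped feature is
    y = log(exp(x1) + exp(x2)), linearized around the speaker means.
    """
    mu1, var1, mu2, var2 = (
        np.asarray(a, dtype=float) for a in (mu1, var1, mu2, var2)
    )
    # Zeroth-order term: the mismatch function evaluated at the means.
    mu_y = np.logaddexp(mu1, mu2)
    # Diagonal Jacobian dy/dx1 at the expansion point; dy/dx2 is 1 - g.
    g = np.exp(mu1 - mu_y)
    # First-order propagation of the diagonal variances.
    var_y = g**2 * var1 + (1.0 - g) ** 2 * var2
    return mu_y, var_y

# Usage: two speakers with equal energy in every band.
mu_y, var_y = vts_overlap_gaussian([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
```

When one speaker dominates a band, the Jacobian `g` approaches 1 or 0 there, so the overlap Gaussian falls back to the dominant speaker's statistics in that band.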
Similar resources
Modeling Overlapping Speech using Vector Taylor Series
Current speaker diarization systems typically fail to successfully assign multiple speakers speaking simultaneously. According to previous studies, overlapping errors account for a large proportion of the total errors in multi-party speech diarization. In this work, we propose a new approach using Vector Taylor Series (VTS) to obtain overlapping speech models assuming individual speaker models ...
Detection of Overlapping Speech in Meetings Using Support Vector Regression
A method of detecting overlapping speech in meetings is proposed in this paper. The eigenvalue distribution of the spatial correlation matrix reflects information on the relative power of sound sources. By applying Support Vector Regression to a set of input eigenvalues, the relative power of sources is estimated. Based on this, overlapping speech is then detected. The proposed method was evalu...
An Acoustic Study of Emotivity-Prosody Interface in Persian Speech Using the Tilt Model
This paper aims to explore some acoustic properties (i.e. duration and pitch amplitude of speech) associated with three different emotions: anger, sadness and joy against neutrality as a reference point, all being intentionally expressed by six Persian speakers. The primary purpose of this study is to find out if there is any correspondence between the given emotions and prosody patterning in P...
Acoustic Analysis of Whispered Speech for Phoneme and Speaker Dependency
Whisper is used by speakers in certain circumstances to protect personal information. Due to the differences in production mechanisms between neutral and whispered speech, there are considerable differences between the spectral structure of neutral and whispered speech, such as formant shifts and shifts in spectral slope. This study analyzes the dependency of these differences on speakers and p...
Detecting overlapping speech with long short-term memory recurrent neural networks
Detecting segments of overlapping speech (when two or more speakers are active at the same time) is a challenging problem. Previously, mostly HMM-based systems have been used for overlap detection, employing various different audio features. In this work, we propose a novel overlap detection system using Long Short-Term Memory (LSTM) recurrent neural networks. LSTMs are used to generate framewi...